Abstract 

Multiple sclerosis (MS) is a chronic autoimmune disease that affects the central nervous system. The progression and severity 
of MS vary by individual, but it is generally a disabling disease. Although medications have been developed to slow the disease 
progression and help manage symptoms, MS research has yet to result in a cure. Early diagnosis and treatment of the disease have 
been shown to be effective at slowing the development of disabilities. However, early MS diagnosis is difficult because symptoms 
are intermittent and shared with other diseases. Thus, most previous work has focused on uncovering the risk factors associated 
with MS and predicting the progression of the disease after a diagnosis rather than on predicting the disease itself. This paper investigates the use 
of data available in electronic medical records (EMRs) to create a risk prediction model, thereby helping clinicians perform the 
difficult task of diagnosing an MS patient. Our results demonstrate that even given a limited time window of patient data, one 
can achieve reasonable classification with an area under the receiver operating characteristic curve of 0.724. By restricting our 
features to common EMR components, the developed models can also generalize to other healthcare systems. 



I. Introduction 




Multiple sclerosis (MS) is a chronic, progressive, and incurable autoimmune disease. Inflammation damages the myelin 
sheath, the protective coating of nerve cells, and causes signal disruption in the brain and spinal cord. The deterioration of 
nerve cells eventually becomes irreversible and leads to the development of disabilities. At least 1.3 million people worldwide 
are afflicted with MS, with an average onset age of 29 years. The incidence and prevalence rates vary amongst countries, 
but MS remains a global problem [1]. Currently no cure exists for MS, but medications can help manage the symptoms, modify 
the disease course, and enhance the lifestyle of MS patients. Clinical trials have provided evidence that early diagnosis and 
treatment can slow the progression of MS, delaying the development of disabilities [2], [3]. Thus, accurate identification of 
patients with high risk of developing MS is crucial to limiting the disease activity and prolonging a 'normal' patient lifestyle. 

Early MS diagnosis is a difficult problem because there is no single diagnostic test and common clinical features are shared with 
other diseases. Neurologists rely primarily on either the Poser or McDonald diagnostic criteria to classify the disease. The 
Poser criteria separate MS into four groups based on attacks, clinical evidence, and paraclinical evidence [4]. The McDonald 
diagnostic criteria, developed in 2001, leverage advancements in magnetic resonance imaging (MRI) techniques to facilitate 
diagnosis in typical clinical presentations [5]. Recent modifications to the McDonald criteria improve the classification applicability 
to pediatric, Asian, and Latin American communities [6]. Nonetheless, a neurologist still relies on performing an exclusion 
diagnosis in conjunction with the patient's symptoms and medical history. 

The advent of electronic medical records (EMRs) has increased the availability of medical data. Consequently, data mining 
and machine learning techniques have been used to develop clinical decision support systems to aid medical professionals. The 
problem of identifying patients with high risk of MS is a prime candidate for using EMRs to develop a data-driven prediction 
model. This paper investigates the feasibility and performance of a predictive disease model based on existing EMRs. Although 
our work is limited to patient data over a 7-year period, we establish a sparse baseline risk prediction model and demonstrate 
reasonable classification accuracy. 

II. Background and Related Work 

The exact nature and cause of MS are still unknown. Epidemiological studies have focused on discovering the variables that 
influence the development of MS. Prior research has identified genetic, environmental, and comorbidity risk factors that affect 
the disease incidence rates. These variables have been used to build models to predict the diagnosis and progression of the 
disease. 



A. Risk factors 

Genetic susceptibility to MS has been supported by the following risk factors: race, gender, and family history. Genetic 
epidemiology studies have demonstrated a rise in disease risk when a family member is affected with MS [7], [8], [9], [10]. 
The increase in risk is correlated with the degree of kinship [7], [8], [9], [10]. Furthermore, the familial implications may also 
pertain to other autoimmune diseases [9], [11], [10]. An individual's race, which is genetically determined, also factors into 
the development of the disease. Certain races, such as Asian, Native American, and African American, are less susceptible [7], 
[10]. Additionally, MS predominantly afflicts females, who exhibit an earlier onset of the disease than their male counterparts [12]. 



The place of residence during a patient's formative years is one of the environmental factors in the development of MS. 
Studies have shown that immigrants migrating before adolescence acquire the risk associated with their new residence while 
immigrants moving after adolescence retain the risk of their original residence [13], [14], [10]. The effects of latitude and the 
hygiene hypothesis may account for the geographic variations beyond genetic factors. 

The latitude gradient, a rise in incidence and prevalence rates with an increase in latitude, was previously a prominent 
MS feature but has declined in recent years. Sunlight duration, sunlight intensity, and vitamin D levels have been proposed 
as potential explanations for this phenomenon [13], [15], [16], [11], [10]. The seasonal vulnerability also demonstrates the 
importance of sun exposure, as a patient is most vulnerable during the winter [13]. However, there are notable exceptions to 
the latitude theory in coastal regions of Norway and Japan, where a high consumption of fish compensates for the lack of sunlight 
exposure [13], [10]. 



The hygiene hypothesis postulates that early exposure to various infectious agents protects the patient against the risk of MS [16], 
[11], [10]. One infection in particular, Epstein Barr Virus (EBV), has been heavily associated with MS. Individuals with high 
levels of anti-EBV antibodies have an increased risk of MS [17]. Additionally, contracting EBV at a later age also increases the likelihood 
of developing MS [18], [11], [10]. Other viruses and infections, such as human herpesvirus 6 (HHV6) and Chlamydia 
pneumoniae, have been proposed but lack sufficient evidence to support a causal effect on disease risk [15], [11]. 

One consequence of the hygiene hypothesis is the relationship between vaccinations and the susceptibility to MS. Countries 
with higher hygiene standards generally mandate immunizations to reduce the number of infections. However, during 
the late 1990s, concerns grew over the hepatitis B vaccine increasing the risk for MS [18]. Although subsequent studies [19], 
[20] failed to find a significant correlation between the vaccine and the development of MS, the hypothesis that vaccinations 
may influence the development of the disease should not be dismissed [18]. 

An individual's lifestyle, through diet and smoking habits, also factors into the disease risk. Individuals who consume 
non-marine meat have a higher risk of developing the disease [11]. However, fish and seafood consumption protects against 
MS [12], [16], [11], [10]. Marine life has a higher concentration of polyunsaturated fatty acids and antioxidants, which have 
anti-inflammatory properties that help suppress the disease process [13], [18]. High consumption of saturated fatty acids during 
one's childhood may cause adolescent obesity, which is associated with an increased risk of MS [10]. Additionally, cigarette 
smoking has been shown to influence the development of MS [16], [11], [10]. 

Shifts in an individual's hormone levels have also been suggested as factors in the disease process. A decrease in the number 
of MS relapses during pregnancy suggests the transient benefits of higher levels of estrogen [15], [12]. A study on British 
women showed that the recent use of oral contraceptives reduced the risk of MS [21]. However, a subsequent US study [22] 
was unable to obtain evidence that supported the benefits of oral contraceptives. 

Other autoimmune disorders and specific cancers have been proposed as potential comorbidities of MS. In a paper that 
summarized the environmental features examined in etiological research on MS [11], Lauer noted that inflammatory bowel 
disease (IBD), ulcerative colitis, and Type 1 diabetes have the strongest correlations with MS amongst the various autoimmune 
disorders. The paper also referenced potential associations with Hodgkin's, oral, and colon cancers, with the caveat that there 
was insufficient evidence to support these connections. 

B. Predictive models 

Predictive studies have primarily focused on the progression of the disease. Bergamaschi et al. [23] identified clinical features 
that could help predict the onset of secondary progression, defined by an increase in Kurtzke's Expanded Disability Status 
Scale (EDSS), using patient data collected in the first year of the disease. The factors discovered in the study were then used 
to propose a Bayesian Risk Estimate for Multiple Sclerosis (BREMS) score to predict the risk of reaching secondary 
progression [24]. A recent study suggested the use of EDSS ranking to identify patients at risk for high progression rates 5 
years from the onset of the disease [25]. 

Scoring systems have also been developed to assess the risk of disability. A study showed that the MS Functional Composite, 
originally proposed as a clinical outcome measure, could be used to determine the risk of severe physical disability [26]. The 
Magnetic Resonance Disease Severity Scale (MRDSS) combined MRI measures into a composite score to predict the progression 
of physical disabilities [27]. Bazelier [28] derived a score using Cox proportional hazard models to estimate the long-term risk 
of osteoporotic and hip fractures in MS patients. Another study conducted by Margaritella et al. [29] used the Evoked Potentials 
score to predict the progression of disability and identify patients with benign MS. 

Limited research has been done with regard to predicting the risk of developing MS. One work predicted MS in patients 
with monosymptomatic optic neuritis using MRI examination findings, oligoclonal bands in cerebrospinal fluid (CSF), 
immunoglobulin (Ig) G index, and the seasonal time of onset [30]. Thrower [31] suggested the use of clinical characteristics of 
optic neuritis and transverse myelitis to identify high-risk MS patients. More recently, De Jager et al. [32] proposed a weighted 
genetic risk score (wGRS) based on genetic susceptibility loci in the context of environmental risk factors. However, prior 
research relies on specialized measurements that are performed to confirm an MS diagnosis. The approaches suggested do not 
generalize to all patients and fail to allow for early diagnosis and intervention of high-risk MS patients. 

TABLE I: The list of MS features and their associated categories. The order of feature introduction is denoted by the number 
next to the category name. 

Category (order)           Features
Demographics (1)           Gender; Ethnicity; Race; Age
Family History (1)         MS; Mental Illness; Colon Cancer; Breast Cancer; Lupus; Thyroiditis; Diabetes;
                           Inflammatory Bowel
Autoimmune (2)             Inflammatory Bowel; Celiac; Uveitis; Thyroiditis; Lupus; Rheumatoid arthritis;
                           Sjögren's syndrome; Bell's palsy; Guillain Barre; Diabetes; Vitamin D deficiency
Microbial (3)              Measles, mumps, rubella; Epstein Barr Virus
Mental Illness (4)         Bipolar; Schizophrenia
Cancer (5)                 Lymphoma; Oral; Breast; Colon
Vaccine (6)                Hepatitis (A+B); Diphtheria, tetanus, pertussis; Polio; Influenza;
                           Measles, mumps, and rubella; Varicella (chicken pox); Meningococcal; Pneumococcal;
                           Haemophilus influenzae type b; Human Papillomavirus
Reproductive (7)           Hysterectomy; Oral contraceptive pills; Estrogen replacement therapy
MRI Scans + Obesity (8)    Obesity; Abnormal brain MRI; Brain MRI; Cervical spine MRI; Thoracic spine MRI
Blood Tests (9)            Erythrocyte sedimentation rate; Lyme; B12; ANA panel; Anti-cardiolipin antibody; Zinc;
                           Cerebrospinal fluid exam

III. Materials and Methods 

A. Data 

Our retrospective study used de-identified patient data from the NorthShore Enterprise Data Warehouse (EDW). The data 
was collected from January 2006 to July 2012 and contained information pertaining to demographics, medications, medical 
encounters and procedures. 

The study examined adults (> 18 years of age) with complete demographic data (age, gender, ethnicity, and race). Any 
individual diagnosed with an MS ICD-9 code ("340") during a Neurology office visit was selected as a case patient. A total of 1,456 case 
patients were identified in the NorthShore EDW. However, only 737 of the patients had recorded electronic medical encounters 
prior to the initial diagnosis date. For each of the 737 case patients, four control subjects with matching age and gender were 
selected from the general population, for a total of 3,685 patients. 
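
The 1:4 matching step can be sketched as follows. This is only an illustration: it assumes exact matching on age and gender and sampling without replacement, neither of which is specified above, and the record layout and the function name `match_controls` are hypothetical.

```python
import random
from collections import defaultdict

def match_controls(cases, population, ratio=4, seed=0):
    """Pick `ratio` controls per case with the same (age, gender).

    `cases` and `population` are lists of dicts with hypothetical keys
    'id', 'age', 'gender'. Controls are drawn without replacement.
    """
    rng = random.Random(seed)
    case_ids = {c["id"] for c in cases}
    # Group eligible controls (non-cases) by their matching key.
    pool = defaultdict(list)
    for person in population:
        if person["id"] not in case_ids:
            pool[(person["age"], person["gender"])].append(person)
    for candidates in pool.values():
        rng.shuffle(candidates)

    matched = {}
    for case in cases:
        candidates = pool[(case["age"], case["gender"])]
        if len(candidates) < ratio:
            raise ValueError(f"not enough controls for case {case['id']}")
        # Pop so that no control is reused for another case.
        matched[case["id"]] = [candidates.pop() for _ in range(ratio)]
    return matched

# Toy data: one case and six eligible controls of the same age and gender.
cases = [{"id": 1, "age": 34, "gender": "F"}]
population = cases + [{"id": 100 + i, "age": 34, "gender": "F"} for i in range(6)]
matched = match_controls(cases, population)
```

With 737 case patients, four controls each gives 2,948 controls and 3,685 patients in total, which agrees with the counts reported above.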

B. Predictor Variables 

A comprehensive list of potential features was curated from prior MS research, detailed in Section II. The list was also 
expanded to include common vaccinations, cancers, mental illnesses, and autoimmune diseases. Unfortunately, some of the 
variables, such as lifestyle factors (smoking and alcohol use) and diet, were unrecorded in the EMR. In addition, some diseases 
were excluded because none of the patients received the particular medical diagnosis during the study period. Table I 
enumerates the features used in our retrospective study. 

The initial MS diagnosis date is used to define t_0 for case patients. Since control patients did not have an MS diagnosis, t_0 
is randomly selected from the patient's encounter dates. Figure 1 shows the frequency of the number of encounters with the 
same date and location prior to t_0. Case patients generally have a low number of previous encounters, while there is a more 
even distribution amongst the control patients. An alternative option was to use the same t_0 for matching control patients. 
However, this exacerbated the discrepancy in the number of encounters prior to t_0 between case and control. Thus, a random 
encounter date was used to define t_0 for control patients. 

All data obtained after t_0, except those related to family history, were discarded to limit the potential effects of confounding 
factors. Family medical history spanned the entire study period because its collection time is unimportant: a patient may fail to 
disclose all the family history in the first few medical encounters but reveal the information in later encounters after being 
diagnosed with a certain disease. 
Fig. 1: Histogram of the number of encounters before t_0. 



Binary values were used to indicate the presence of a particular medical diagnosis (ICD-9 code) found in a patient's encounter 
data prior to t_0. The vaccination, reproductive, and family medical history features were extracted in a similar fashion, denoting the 
existence of specific supplemental classification codes (ICD-9 V codes). MRI scan features, except for an abnormal MRI result, 
which was extracted via an ICD-9 code, signified the presence of specific medical procedure requests. Given the sparsity of 
numeric blood test results, each blood test feature was converted to three levels: (1) Unobserved, (2) Observed-Normal, and (3) 
Observed-Abnormal, based on the ranges provided by the EDW. The entire feature set was represented using a binary matrix, where 
categorical variables were converted to dummy variables. 
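
The encoding described above can be illustrated with a short sketch. The helper names, the two diagnosis codes, and the B12 reference range (200 to 900) are hypothetical placeholders; the study used the normal ranges supplied by the EDW.

```python
LEVELS = ["Unobserved", "Observed-Normal", "Observed-Abnormal"]

def blood_test_level(value, low, high):
    """Map a numeric lab result to one of three levels.

    `value` is None when the test was never recorded before t_0;
    `low`/`high` stand in for the reference range (hypothetical here).
    """
    if value is None:
        return "Unobserved"
    return "Observed-Normal" if low <= value <= high else "Observed-Abnormal"

def one_hot(name, level, levels):
    """Expand a categorical level into binary dummy variables."""
    return {f"{name}={lvl}": int(lvl == level) for lvl in levels}

def encode_patient(diagnosis_codes, tracked_codes, b12_value):
    """Build one binary feature row for a patient."""
    # 1 if the ICD-9 code appears in any encounter before t_0, else 0.
    row = {f"dx_{code}": int(code in diagnosis_codes) for code in tracked_codes}
    # Placeholder reference range for B12; not taken from the study.
    row.update(one_hot("B12", blood_test_level(b12_value, 200, 900), LEVELS))
    return row

# A patient with one tracked diagnosis and no B12 test before t_0.
row = encode_patient({"250.00"}, ["250.00", "710.0"], None)
```

Keeping "Unobserved" as its own level lets the model distinguish a test that was never ordered from one that came back normal, which matters here because several of the selected variables in the results are unobserved-test indicators.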

C. Model Development 

Multivariate logistic regression models were used to predict an MS diagnosis at the next encounter. The selection of the logistic 
regression model was motivated by the popularity of the model in the medical community, the simplicity of the model, and the 
interpretability of the results. To evaluate the effect of specific predictor categories on accuracy, new features were introduced 
in the order defined by Table I. The first feature set contained only demographic and family history data, and was designed to 
mimic the information available at the first office visit. The last feature set contained all the predictor variables listed in Table 
I. For each feature set, two sets of logistic regression models were trained: (1) forward stepwise model selection by Akaike 
information criterion (AIC) and (2) backward stepwise model selection by AIC. Stepwise model selection by AIC was used 
to minimize the model complexity, or encourage a sparser feature representation, without sacrificing predictive performance. 
10-fold cross-validation was used to estimate the accuracy of each model. 
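
Forward stepwise selection by AIC can be sketched as a greedy loop. The sketch below is not the implementation used in the study: `fit_loglik` is a hypothetical caller-supplied function that would fit a logistic regression on the given features and return the maximized log-likelihood, and a toy stand-in is used here so the example is self-contained.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L). Lower is better."""
    return 2 * n_params - 2 * log_likelihood

def forward_stepwise(candidates, fit_loglik):
    """Greedy forward selection by AIC.

    `fit_loglik(features)` is assumed to return the maximized
    log-likelihood of a model using `features`. The intercept
    counts as one parameter.
    """
    selected = []
    best = aic(fit_loglik(selected), 1)
    remaining = list(candidates)
    while remaining:
        # Score every one-feature extension of the current model.
        scored = [(aic(fit_loglik(selected + [f]), len(selected) + 2), f)
                  for f in remaining]
        score, feature = min(scored)
        if score >= best:  # no extension lowers the AIC: stop
            break
        best = score
        selected.append(feature)
        remaining.remove(feature)
    return selected, best

# Toy stand-in for model fitting: each feature adds a fixed gain
# to the log-likelihood (illustrative numbers only).
GAIN = {"mri": 30.0, "b12": 5.0, "noise": 0.4}

def toy_loglik(features):
    return -100.0 + sum(GAIN[f] for f in features)

selected, score = forward_stepwise(GAIN, toy_loglik)
```

In the toy run, the weakly informative feature is rejected because its gain in log-likelihood does not offset the penalty of 2 per added parameter, which is how AIC encourages a sparse model. Backward selection follows the same logic but starts from the full model and removes one feature at a time.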

IV. Results 

A. Feature Set Comparisons 

Figure 2 shows the area under the receiver operating characteristic curve (AUC) for each feature set and model selection method. 
Both feature selection methods result in similar predictive performance. Given only data that is available at the first office 
visit (demographic and family history), the forward selection model with an AUC of 0.538 ± 0.016 marginally outperforms 
random guessing. Feature set 2, an expansion of the features to include autoimmune disorder diagnoses, increases the AUC by 
0.072. The performance then remains stagnant with the addition of the microbial, mental illness, and cancer feature categories. 
However, vaccinations, MRI scans, and blood test results boost the predictive performance. Using all the available features 
(feature set 9), the forward and backward feature selection models predict an MS diagnosis at the next visit with an AUC of 
0.724 ± 0.033 and 0.718 ± 0.030, respectively. 
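
For reference, the AUC can be read as the probability that a randomly chosen case patient receives a higher predicted risk than a randomly chosen control. A minimal pairwise sketch of that definition, using made-up scores rather than study data:

```python
def auc(scores, labels):
    """AUC as the probability that a random case outranks a random control.

    Ties count as one half. The pairwise loop is quadratic, which is
    acceptable for a sketch.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Illustrative predicted risks: two cases (label 1) and three controls (label 0).
scores = [0.9, 0.4, 0.35, 0.8, 0.2]
labels = [1, 1, 0, 0, 0]
value = auc(scores, labels)
```

Under this reading, an AUC of 0.5 corresponds to random guessing, so the 0.538 obtained from demographics and family history alone is only slightly better than chance.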

Stepwise feature selection using AIC produces a model using a sparse set of features. Figure 3 displays the comparison of 
the number of selected variables per feature set. The number of features selected with backward stepwise regression remains 
fairly constant for each feature set. This suggests the later categories (blood test results and MRI scans) are more informative. 
In addition, for feature sets 7-9, the backward selection method results in a sparse set of features. Both selection models on 
feature set 9 select fewer than 15% of the potential features. 

Figure 4 compares the joint predicted probabilities of consecutive feature sets for case patients and illustrates the effects 
of adding specific feature categories. Some of the transitions have been omitted since they are similar to the first feature set 
transition (1 → 2). The addition of autoimmune disease diagnoses (1 → 2) generally increases the predicted risk. The trend is 
most noticeable in the transition plot from feature set 7 to 8, where the points lie predominantly above the dotted line. Inclusion 
of blood test results (8 → 9) marginally improves the predicted risk, but it also lowers the predicted risk for a substantial portion of 
patients who had high risk in the previous model. 
Fig. 2: A comparison of the AUC as categories of variables are added to the feature set. 

Fig. 3: A box plot of the number of variables selected per feature set and selection method. 

A joint density estimate of two consecutive feature sets for the control patients is shown in Figure 5. For the first 
transition (1 → 2), the improvement in predictive performance can also be traced to the decrease in the predicted risk of 
low-risk patients. In addition, the figure illustrates that as the feature set is expanded, the density slowly shifts away from the top 
right corner to the bottom half of the plot. The transition from feature set 8 to 9 shows that the predicted risk is distributed 
more evenly for the control patients compared to the first feature set, where the probabilities lie around 0.21. 

Figures 4 and 5 show the dispersion of risk probabilities as more features are introduced. The addition of features related 
to MRI scans, or feature set 8, improves the separation between case and control through higher predicted probabilities for 
high-risk patients. The AUC improvement obtained from the inclusion of blood test results, feature set 9, can be primarily 
attributed to the decrease in predicted risk of control patients. These figures provide a graphical analysis of the benefits of 
specific feature categories. 
B. Feature Set 9 Results 

We further focus on the model with the best predictive performance, the forward selection model trained on all the features 
(feature set 9). Table II summarizes the variables for which the magnitude of the coefficient is larger than 1 in the majority of 
the folds. The table also displays the number of case and control patients with the feature, the odds ratio, and the p-value from 
a chi-square test to determine the significance of the variable. If we use the p-value as an initial filter with α = 0.05, the following 
features would be eliminated: EBV; Bell's palsy; colon cancer; family history of mental illness, MS, and inflammatory bowel 
disease; the varicella vaccine; schizophrenia; and the Haemophilus influenzae type b vaccine. However, the selection of some of 
these variables, such as history of mental illness and the varicella vaccine, is surprising given the lack of sufficient evidence in 
prior work to support their effect on the development of MS. 
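
The text does not state how the odds ratio and p-value columns of Table II were computed. The listed odds ratios appear to equal the case count divided by the control count (for example, 3/1 = 3 and 10/4 = 2.5), which differs from the conventional odds ratio of a 2x2 table. The sketch below shows both quantities, together with a chi-square test that assumes Yates' continuity correction; that correction is our assumption, and under it the Mental Illness (FH) and EBV rows give p-values of roughly 0.033 and 0.093.

```python
import math

N_CASE, N_CONTROL = 737, 2948  # cohort sizes reported in Section III-A

def count_ratio(case_n, control_n):
    """Ratio of raw counts; this appears to be what Table II lists."""
    return case_n / control_n

def odds_ratio(case_n, control_n):
    """Conventional 2x2 odds ratio (a*d)/(b*c)."""
    a, b = case_n, control_n
    c, d = N_CASE - a, N_CONTROL - b
    return (a * d) / (b * c)

def chi_square_p(case_n, control_n, yates=True):
    """p-value of a chi-square test (1 degree of freedom) on the 2x2 table."""
    total = N_CASE + N_CONTROL
    with_feature = case_n + control_n
    observed = [case_n, control_n, N_CASE - case_n, N_CONTROL - control_n]
    expected = [N_CASE * with_feature / total,
                N_CONTROL * with_feature / total,
                N_CASE * (total - with_feature) / total,
                N_CONTROL * (total - with_feature) / total]
    stat = 0.0
    for o, e in zip(observed, expected):
        diff = abs(o - e)
        if yates:
            diff = max(diff - 0.5, 0.0)  # continuity correction (assumed)
        stat += diff * diff / e
    # Survival function of chi-square with 1 df: erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(stat / 2))

# Mental Illness (FH) row of Table II: 3 case patients, 1 control patient.
ratio = count_ratio(3, 1)
conventional = odds_ratio(3, 1)
p_value = chi_square_p(3, 1)
```

For this row the count ratio is 3, while the conventional odds ratio is about 12, because controls outnumber cases four to one. Either way, estimates based on a handful of patients are unstable, which is consistent with the large coefficient variance reported for the rare features.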

Figure 6 shows the distribution of predicted risk values. The figure shows that even with all the features listed in Table 
I, there is still a considerable overlap of predicted probabilities between case and control patients. Better separation between 
these two classes can improve the risk prediction accuracy. The plot suggests that incorporating additional diagnoses or temporal 
aspects of existing diagnoses may be necessary to improve model performance. 

Fig. 4: A scatterplot comparison of the predicted probabilities between consecutive feature sets for 737 case patients. The 
dotted line signifies no change in predicted risk. 

Fig. 5: A 2D kernel density estimate of the previous feature set and next feature set predicted probabilities for 2,948 control 
patients. 

Figure 7 contains the performance plots for the forward selection models trained on feature set 1 and feature set 9. Figure 
7(a) demonstrates the noticeable improvement using all the available features. Additionally, the model trained on feature set 
1, demographics and family history features, barely outperforms random chance. The tradeoff between sensitivity, specificity, 
and positive predictive value can be seen in Figure 7(b). Feature set 9 has a higher intersection between the sensitivity and 
specificity curves, which is summarized in Table III. In addition, the full-featured model generally achieves a better positive 
predictive value for all threshold values. However, the positive predictive value and sensitivity curves cross at a value of approximately 0.40. 
At this point, we can accurately diagnose 40% of the case patients, but only 2 out of every 5 patients predicted to have a high 
risk of MS will be diagnosed with MS at the next office visit, a high number of false positives. 



C. Discussion 

The results demonstrate reasonable predictive accuracy using all the available features. One potential hindrance lies in the 
current feature construction. As Figure 1 shows, there are a limited number of encounters prior to t_0 for case patients. Thus, it 
is difficult to determine whether an unobserved diagnosis is truly absent or merely missing due to the lack of longitudinal data (for 
example, the patient was diagnosed prior to the study period). Additionally, certain diagnoses, such as EBV, can only be verified through culture samples, which 
are not performed for every patient. 

Another limitation of our study is the reliance on ICD-9 and procedure codes. A patient may exhibit all the clinical symptoms 
of a specific disease, but the disease is absent from the encounter data because the disorder has not been diagnosed. The ambiguity 
of ICD-9 codes and diagnostic discrepancies between medical doctors can also impact our feature construction. Moreover, the 
conversion of blood test results to a categorical feature may be inaccurate, as the testing protocol may have changed during the 
study window. Therefore, a patient's feature vector may not accurately reflect their medical history. 
TABLE II: Mean coefficient values, odds ratio, and p-value for variables picked in at least 5 of the 10 folds with |β| > 1 

Feature                         Beta               Case   Control   Odds ratio   p-value
Presence CSF oligoclonal bands  16.255 ± 0.545     27               ∞            0.000
Mental Illness (FH)             6.298 ± 3.101      3      1         3            0.033
EBV                             3.974 ± 3.924      3      2         1.5          0.093
Abnormal brain MRI              2.877 ± 0.313      10     4         2.5          0.000
Unobserved B12                  2.527 ± 0.149      674    2619      0.257        0.047
Obs-Normal B12                  2.375 ± 0.175      54     162       0.333        0.071
Obs-Normal ANA SSB              1.141 ± 0.269      57     95        0.6          0.000
Bell's Palsy                    1.889 ± 0.295      2      3         0.667        0.576
Diabetes                        -1.036 ± 0.078     19     224       0.085        0.000
Obs-Normal ANA DS               -1.066 ± 0.122     44     78        0.564        0.000
Oral contraceptive              -1.244 ± 0.205     2      35        0.057        0.043
DTP vaccine                     -1.829 ± 0.126     12     327       0.037        0.000
Unobserved Lyme Test            -1.980 ± 0.099     692    2924      0.237        0.000
Colon cancer                    -2.927 ± 4.072     2      15        0.133        0.584
Asian race                      -2.925 ± 0.234     2      93        0.022        0.000
MS (FH)                         -3.356 ± 4.307     2      19        0.105        0.023
Unobserved CSF IGG synthesis    -4.841 ± 0.296     700    2946      0.238        0.000
Varicella vaccine               -15.161 ± 0.087           13                     0.145
HPV vaccine                     -15.728 ± 2.188           82                     0.000
Schizophrenia                   -15.763 ± 0.369           10                     0.235
Estrogen replacement            -15.823 ± 0.209           22                     0.037
IBD (FH)                        -17.236 ± 0.420           3                      0.885
HIB vaccine                     -18.156 ± 0.521           2                      1.000




Fig. 6: Box plot of the predicted probabilities using a forward selection model on all the features. 



Our study also suggests incorporating additional features. Given that some of the variables were unrecorded in the structured 
portion of the EMR, parsing the clinical notes could yield information regarding lifestyle factors, diet, and detailed 
family and medical history. In addition, temporal aspects of the medical diagnoses were not included in our feature set since 
the data was confined to medical encounters over a 6-year period. 



TABLE III: The intersection of the sensitivity and specificity curves from Figure 7(b) 

Feature Set   Cutoff   Sensitivity   Specificity   PPV
1             0.212    0.528         0.528         0.218
9             0.241    0.647         0.647         0.314
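
The PPV column of Table III follows from the sensitivity, the specificity, and the fraction of case patients in the cohort. A short check via Bayes' rule, assuming the prevalence equals the case fraction of the matched cohort (737 of 3,685 patients):

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value from Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)

# 737 cases among 3,685 patients (1:4 matching) gives a prevalence of 0.2.
PREVALENCE = 737 / 3685

ppv_set1 = ppv(0.528, 0.528, PREVALENCE)  # feature set 1
ppv_set9 = ppv(0.647, 0.647, PREVALENCE)  # feature set 9
```

Both values agree with the table to within rounding. Because the matched cohort contains 20% case patients by construction, the PPV in an unselected population, where MS is far less common, would be lower at the same sensitivity and specificity.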
(a) ROC curves compared to random assignment (x-axis: false positive rate). (b) Sensitivity, specificity, and positive predictive value as a function of threshold (x-axis: threshold). 

Fig. 7: Model performance plots for feature sets 1 and 9. 



V. Conclusion 

This paper presented a risk prediction model from EMRs to help address the difficulty of early diagnosis in MS patients. A 
sparse set of features was selected to minimize model complexity while maintaining reasonable predictive performance. Our 
results show we are able to help identify patients at high risk of developing MS, in spite of a limited sample of patient data. 
In addition, our models have the ability to generalize to other healthcare systems as we rely only on components commonly 
found in electronic patient data. 

The work demonstrates the potential of leveraging EMRs to aid medical professionals with difficult tasks, especially with 
early disease diagnosis. Future work will focus on incorporating temporal components, such as time of diagnosis, into the 
model, decreasing the false positive rate, and integrating a larger control population. 